Data Entry

The Hospital data can be obtained directly from the web or downloaded and then read from a local drive. Both methods are presented below.

In the code block below, the data are read from the web. The URL for the CSV file is copied and pasted from the web page. The first six rows are then displayed with head to be sure that the data were read properly.

  hosp1 <- read.csv("http://facweb1.redlands.edu/fac/jim_bentley/Data/MATH%20111%20Examples/Hospitals/hospitals.csv")
  head(hosp1)
##   hospital condition survival
## 1        A      Good Survived
## 2        A      Good Survived
## 3        A      Good Survived
## 4        A      Good Survived
## 5        A      Good Survived
## 6        A      Good Survived

Reading data from a local drive is done in a similar manner. Below it is assumed that the CSV file is stored in the RStudio project folder.

  hosp2 <- read.csv("hospitals.csv")
  head(hosp2)
##   hospital condition survival
## 1        A      Good Survived
## 2        A      Good Survived
## 3        A      Good Survived
## 4        A      Good Survived
## 5        A      Good Survived
## 6        A      Good Survived

One Categorical Variable

Tables for a single categorical variable are easy to produce using R. A frequency table for the hospital variable in the hosp1 data frame is produced below.

  table(hosp1$hospital)
## 
##    A    B 
## 2100  800

We note that 2100 people went to hospital A and 800 went to hospital B.

To get the proportion of cases in each category we use the prop.table function. Multiplying the result by 100 gives percentages.

  prop.table(table(hosp1$hospital))
## 
##         A         B 
## 0.7241379 0.2758621
  print(prop.table(table(hosp1$hospital)), digits=4)
## 
##      A      B 
## 0.7241 0.2759
  prop.table(table(hosp1$hospital))*100 
## 
##        A        B 
## 72.41379 27.58621
  print(100*prop.table(table(hosp1$hospital)), digits=4)
## 
##     A     B 
## 72.41 27.59

The digits option of the print function sets the number of significant digits that are displayed, which makes it easy to control how “pretty” the output is.
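As a check on what prop.table is doing, the division can be sketched by hand. The counts are typed in from the frequency table above rather than read from the file, and the names counts and props are used only for this illustration.

```r
# Counts copied from the frequency table for hospital
counts <- c(A = 2100, B = 800)

# A proportion is each count divided by the total number of cases
props <- counts / sum(counts)
props

# prop.table() performs the same division
all.equal(props, prop.table(counts))

# round() changes the stored values, while print(..., digits = ) only
# changes how the values are displayed
round(100 * props, 2)
```

Rounding is handy when the percentages are passed on to another function. For display alone, the digits option leaves the underlying values untouched.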

Similar tables can be computed for the survival variable:

  table(hosp2$survival)
## 
##     Died Survived 
##       79     2821
  prop.table(table(hosp2$survival))
## 
##       Died   Survived 
## 0.02724138 0.97275862

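Creating plots is just as easy as generating tables. First note that hospital is a categorical variable. R calls these “factor” variables and will not do arithmetic on them. (Starting with R 4.0.0, read.csv reads text columns as character vectors unless stringsAsFactors=TRUE is given, so a column may first need to be converted with factor.) R’s plot functions are also smart in how they treat categorical data.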
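Whether a column is actually stored as a factor depends on how it was read. The sketch below shows one way to check and convert. It reads a tiny inline CSV instead of hospitals.csv, and its three rows (including the condition value "Poor") are made up for illustration. The names csv_text, demo_chr, and demo_fct are hypothetical.

```r
# A tiny inline CSV stands in for hospitals.csv (made-up rows, illustration only)
csv_text <- "hospital,condition,survival
A,Good,Survived
A,Good,Died
B,Poor,Survived"

# Default read: the class of a text column depends on the R version
demo_chr <- read.csv(text = csv_text)
class(demo_chr$hospital)

# Ask read.csv to convert text columns to factors as they are read
demo_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(demo_fct$hospital)
levels(demo_fct$hospital)

# Or convert a single column after the data have been read
demo_chr$hospital <- factor(demo_chr$hospital)
```

The table function works the same way on character and factor columns, so the frequency tables above are not affected by this choice.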

We can create a bar graph in a number of ways. Two quick ones are presented below.

  ### Using base plots
  barplot(table(hosp2$hospital))
  ### Using lattice graphics
  ### (p_load comes from the pacman package; library(lattice) also works)
  p_load(lattice)

  histogram(~hospital, data=hosp1)

  histogram(~hospital, data=hosp1, type="count")

  histogram(~hospital, data=hosp1, type="density")

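Note that the type option sets the vertical scale. The first plot uses the default, percentages. The other two use counts and density, and for a categorical variable the density is the same as the proportion.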

While R will create pie charts, they should generally be avoided because they are harder to interpret than bar charts. If you must make them, the pie function works.

  pie(table(hosp1$hospital), col=c("skyblue","pink"))

Two Categorical Variables

Two-way, or RxC, frequency tables are equally easy to construct in R. We will use the hospital data from above to look at how they may be generated.

  table(hosp1$survival,hosp1$hospital)
##           
##               A    B
##   Died       63   16
##   Survived 2037  784
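Row and column totals are often wanted alongside the cell counts. A minimal sketch using addmargins is given below. So that it runs without the CSV file, the table is rebuilt from the four counts printed above, and the name tab is used only for this illustration.

```r
# Rebuild the two-way table from the counts shown above
# (matrix() fills column by column: hospital A first, then hospital B)
tab <- as.table(matrix(c(63, 2037, 16, 784), nrow = 2,
                       dimnames = list(survival = c("Died", "Survived"),
                                       hospital = c("A", "B"))))

# Append a "Sum" row and a "Sum" column holding the marginal totals
with_totals <- addmargins(tab)
with_totals
```

With the data frame loaded, addmargins(table(hosp1$survival, hosp1$hospital)) should give the same totals.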

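Conversion to table, row, and column proportions and percentages is carried out using prop.table. With no second argument each cell is divided by the grand total, with 1 by its row total, and with 2 by its column total.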

  ### Table values
    prop.table(table(hosp1$survival,hosp1$hospital))
##           
##                      A           B
##   Died     0.021724138 0.005517241
##   Survived 0.702413793 0.270344828
    print(100*prop.table(table(hosp1$survival,hosp1$hospital)), digits=4)
##           
##                  A       B
##   Died      2.1724  0.5517
##   Survived 70.2414 27.0345
  ### Row values
    prop.table(table(hosp1$survival,hosp1$hospital),1)
##           
##                    A         B
##   Died     0.7974684 0.2025316
##   Survived 0.7220844 0.2779156
    print(100*prop.table(table(hosp1$survival,hosp1$hospital),1), digits=4)
##           
##                A     B
##   Died     79.75 20.25
##   Survived 72.21 27.79
  ### Column values
    prop.table(table(hosp1$survival,hosp1$hospital),2)
##           
##               A    B
##   Died     0.03 0.02
##   Survived 0.97 0.98
    print(100*prop.table(table(hosp1$survival,hosp1$hospital),2), digits=4)
##           
##             A  B
##   Died      3  2
##   Survived 97 98
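The second argument of prop.table can be easier to follow when it is named and checked by hand. The sketch below rebuilds the table from the counts above (the names tab, row_props, col_props, and by_hand are used only here) and reproduces the column proportions with sweep.

```r
# Rebuild the two-way table from the counts shown above
tab <- as.table(matrix(c(63, 2037, 16, 784), nrow = 2,
                       dimnames = list(survival = c("Died", "Survived"),
                                       hospital = c("A", "B"))))

# margin = 1 divides each cell by its row total
row_props <- prop.table(tab, margin = 1)

# margin = 2 divides each cell by its column total
col_props <- prop.table(tab, margin = 2)

# The same column proportions computed by hand
by_hand <- sweep(tab, 2, colSums(tab), "/")
all.equal(col_props, by_hand, check.attributes = FALSE)

# Rows of row_props and columns of col_props each sum to one
rowSums(row_props)
colSums(col_props)
```

The column proportions are the ones to use when comparing hospitals, since they give the death rate within each hospital (63/2100 for A and 16/800 for B).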

StatKey generates stacked bar charts. In R we can generate stacked bar charts based on counts, proportions, or percentages.

  ### Store the frequencies so that the code is easier to read
    freq <- table(hosp1$survival,hosp1$hospital)
  ### Modify the margins to make room for the legend
    par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
  ### Make the plot and add the legend
    barplot(freq, col=heat.colors(length(rownames(freq))), width=2, ylab = "Count", xlab = "Hospital")
    legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(freq))), legend=rownames(freq))

  ### Store the proportions so that the code is easier to read
    prop <- prop.table(table(hosp1$survival,hosp1$hospital),2)
  ### Modify the margins to make room for the legend
    par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
  ### Make the plot and add the legend
    barplot(prop, col=heat.colors(length(rownames(prop))), width=2, ylab = "Proportion", xlab = "Hospital")
    legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(prop))

  ### Store the percentages so that the code is easier to read
    perc <- 100*prop.table(table(hosp1$survival,hosp1$hospital),2)
  ### Modify the margins to make room for the legend
    par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
  ### Make the plot and add the legend
    barplot(perc, col=heat.colors(length(rownames(perc))), width=2, ylab = "Percentage", xlab = "Hospital")
    legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(perc))), legend=rownames(perc))
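The three plots above repeat the same steps, so one option is to wrap them in a small function. The sketch below is only one way to do this. The helper stacked_bar and its arguments are hypothetical names, and the table is rebuilt from the counts above so that the code runs without the CSV file. The helper also restores the graphics settings that par changes.

```r
# Hypothetical helper: stacked bar plot with a legend in the right margin
stacked_bar <- function(tab, ylab) {
  cols <- heat.colors(nrow(tab))
  # Widen the right margin, and restore the old settings when the function exits
  old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
  on.exit(par(old_par))
  mids <- barplot(tab, col = cols, width = 2, ylab = ylab, xlab = "Hospital")
  legend("topright", inset = c(-0.25, 0), fill = cols, legend = rownames(tab))
  invisible(mids)  # bar midpoints, as returned by barplot()
}

# Rebuild the two-way table from the counts shown above
freq <- as.table(matrix(c(63, 2037, 16, 784), nrow = 2,
                        dimnames = list(survival = c("Died", "Survived"),
                                        hospital = c("A", "B"))))

stacked_bar(freq, "Count")
stacked_bar(prop.table(freq, 2), "Proportion")
stacked_bar(100 * prop.table(freq, 2), "Percentage")
```

Because par is reset on exit, later plots are drawn with the usual margins.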

Side-by-side, or lattice, bar plots are often easier to read than stacked ones.

  ### Load the lattice library
  p_load(lattice)
  ### Barplot of survival frequencies by hospital
  histogram(~survival|hospital, data=hosp1, type="count")

  ### Barplot of survival proportion by hospital
  histogram(~survival|hospital, data=hosp1, type="density", ylab = "Proportion")

  ### Barplot of survival percentages by hospital
  histogram(~survival|hospital, data=hosp1, type="percent")
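A side-by-side layout can also be sketched with base graphics by setting beside=TRUE in barplot. As before, the table is rebuilt from the counts shown earlier so that the code runs on its own, and the names freq and col_perc are used only for this illustration.

```r
# Rebuild the two-way table from the counts shown earlier
freq <- as.table(matrix(c(63, 2037, 16, 784), nrow = 2,
                        dimnames = list(survival = c("Died", "Survived"),
                                        hospital = c("A", "B"))))

# Percentages within each hospital (column percentages)
col_perc <- 100 * prop.table(freq, margin = 2)

# beside = TRUE draws one bar per cell, grouped by hospital, instead of stacking
barplot(col_perc, beside = TRUE, col = heat.colors(nrow(col_perc)),
        legend.text = rownames(col_perc), args.legend = list(x = "top"),
        xlab = "Hospital", ylab = "Percentage")
```

Since deaths are only 2 to 3 percent of cases, the Died bars are short in either layout. Grouping them side by side at least puts them on a common baseline.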

Less informative in this situation are the plots that condition on the rows, that is, on survival. Note that the only difference between these plots and those above is that the two variables are switched.

  ### Load the lattice library
  p_load(lattice)
  ### Barplot of hospital frequencies by survival
  histogram(~hospital|survival, data=hosp1, type="count")

  ### Barplot of hospital proportions by survival
  histogram(~hospital|survival, data=hosp1, type="density", ylab = "Proportion")

  ### Barplot of hospital percentages by survival
  histogram(~hospital|survival, data=hosp1, type="percent")